A Literature Review and Discussion of Malay Rule-Based Affix Elimination Algorithms
Abstract
Stemming is a natural language processing technique that reduces a word to its root. Improving the stemming process can in turn improve information retrieval and knowledge management. Four strategies are widely used in stemming: table lookup, rule-based affix elimination, successor variety, and n-gram. However, not all of these strategies have been applied in Malay stemming algorithms. The best-known strategy for stemming Malay text documents is rule-based affix elimination. This paper discusses several Malay stemming algorithms, including Othman's algorithm, Sembok's algorithm, Idris's algorithm, the Rule Frequency Order Stemmer, and Mangalam's algorithm. It also discusses improvements that researchers have made to earlier Malay stemming algorithms, which indicates the current trend in Malay stemming. Different Malay stemming algorithms also apply different morphological rules. The review concludes that much of the existing work concerns the arrangement of morphological rules. However, the stemming process can still be improved by applying background knowledge, such as root-word dictionaries against which a word can be checked while its affixes are being eliminated.
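As a rough illustration of the idea in the abstract, the sketch below strips affixes by rule and accepts a candidate only if it appears in a root-word dictionary. It is not a reconstruction of any of the algorithms named above: the affix lists, the `ROOTS` set, the `MIN_ROOT_LEN` threshold, and the rule order (prefix, then suffix, then both) are all assumptions chosen to keep the example small.

```python
# Minimal sketch of rule-based affix elimination with a root-word dictionary check.
# The affix lists and the dictionary are tiny illustrative samples (assumptions),
# not the rule sets of any published Malay stemmer.

PREFIXES = ["meng", "mem", "men", "me", "ber", "di", "ter", "pe"]
SUFFIXES = ["kan", "an", "i"]
ROOTS = {"makan", "baca", "main", "ajar", "minum"}
MIN_ROOT_LEN = 3  # assumed lower bound so that stripping never leaves a tiny fragment


def strip_prefixes(word):
    """Return every candidate obtained by removing one known prefix."""
    return [
        word[len(p):]
        for p in PREFIXES
        if word.startswith(p) and len(word) - len(p) >= MIN_ROOT_LEN
    ]


def strip_suffixes(word):
    """Return every candidate obtained by removing one known suffix."""
    return [
        word[:-len(s)]
        for s in SUFFIXES
        if word.endswith(s) and len(word) - len(s) >= MIN_ROOT_LEN
    ]


def stem(word, roots=ROOTS):
    """Return the first candidate found in the root dictionary, else the word itself."""
    word = word.lower()
    if word in roots:
        # Dictionary check first: a root such as "makan" ends in "kan"
        # but must not be stripped.
        return word
    prefix_only = strip_prefixes(word)
    suffix_only = strip_suffixes(word)
    both = [c for p in prefix_only for c in strip_suffixes(p)]
    for candidate in prefix_only + suffix_only + both:
        if candidate in roots:
            return candidate
    return word  # no rule produced a known root; leave the word unchanged


if __name__ == "__main__":
    for w in ["membaca", "bermain", "makanan", "diajarkan", "makan"]:
        print(w, "->", stem(w))
```

The dictionary lookup is what prevents over-stemming here: without it, a blind suffix rule would cut the root "makan" because it happens to end in "kan". A realistic stemmer would need a far larger dictionary and rules for spelling changes at the prefix boundary, which this sketch does not attempt.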
Similar resources
A Minimally-Supervised Malay Affix Learner
This paper presents a minimally-supervised system capable of learning Malay affixation. In particular, the algorithm we describe focuses on identifying p-similar words, and building an affix inventory using a semantic-based approach. We believe that orthographic and semantic analyses play complementary roles in extracting morphological relationships from text corpora. Using a limited Malay corp...
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...
Adaptive Rule-Base Influence Function Mechanism for Cultural Algorithm
This study proposes a modified version of cultural algorithms (CAs) that benefits from a rule-based system for the influence function. This rule-based system selects and applies the suitable knowledge source according to the distribution of the solutions. It is important to apply an appropriate influence function to a specific individual, according to its role in the search process. This rule ...
Towards a Malay Derivational Lexicon: Learning Affixes Using Expectation Maximization
We propose an unsupervised training method to guide the learning of Malay derivational morphology from a set of morphological segmentations produced by a naïve morphological analyzer. Using a morphology-based language model, we first estimate the probability of a given segmentation. We train the model with EM to find the segmentation that maximizes the probability of each morpheme. We extract t...
A Novel Method for Selecting the Supplier Based on Association Rule Mining
One of the important problems in supply chain management is supplier selection. A company holds massive amounts of data from various departments, which makes extracting knowledge from that data complicated. Many researchers have addressed this problem with methods such as fuzzy set theory, goal programming, multi-objective programming, linear programming, mixed integer programming, analyti...
Publication date: 2013